GeneView – Gene-Centric Ranking of Biomedical Text
Abstract
Background: Life scientists spend a great amount of time searching for gene-specific information. It is widely acknowledged that research results are primarily published in scientific literature and that current curation efforts cannot keep up with the fast growth of that literature. It can therefore be assumed that a plethora of gene-specific knowledge is still hidden in large text repositories like MEDLINE. Searching text data sources is difficult, as user queries are usually ambiguous and lead to hundreds of results. Faced with such a number of potentially relevant publications, an appropriate article ranking is important. PubMed, for example, ranks articles by indexing date by default, making it difficult to find seminal papers about a specific topic. In this paper, we introduce GeneView, a gene-centric text mining application capable of searching, ranking, and visualizing biomedical publications.

Results: Our ranking algorithm relies on the assumption that the relevance of a gene for a specific article depends on the frequency with which it is mentioned and on the sections it appears in. For ranking, we introduce a simple evaluation strategy that uses the NCBI Gene2Pubmed mapping as a gold standard. This strategy is used to evaluate different section-specific rankers, the best of which achieves an average precision of 75.5%. The evaluation further confirms our expectation that sections such as the title, abstract, and results are more relevant for gene-specific ranking than others. Surprisingly, incorporating figure and table captions decreased the quality of the ranking results.
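The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes that each gene mention contributes a weight depending on the section it occurs in, and that ranked lists are scored by precision at a cutoff k against a gold-standard set such as one derived from Gene2Pubmed. The section weights, the function names, the gene labels, the article identifiers, and the choice of precision at k are all hypothetical; the abstract does not specify GeneView's actual scoring function or evaluation details.

```python
# Hypothetical section weights; the abstract gives no actual values.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "results": 1.5, "other": 1.0}


def gene_score(mentions, gene_id, weights=SECTION_WEIGHTS):
    """Sum the section weights over all mentions of gene_id in one article.

    mentions is a list of (gene_id, section) pairs, one pair per mention,
    so a gene mentioned more often accumulates a higher score.
    """
    return sum(
        weights.get(section, weights["other"])
        for gene, section in mentions
        if gene == gene_id
    )


def rank_articles(articles, gene_id, weights=SECTION_WEIGHTS):
    """Return ids of articles mentioning gene_id, best score first."""
    scored = [(gene_score(m, gene_id, weights), aid) for aid, m in articles.items()]
    scored = [(score, aid) for score, aid in scored if score > 0]
    # Sort by descending score; break ties by article id for determinism.
    return [aid for score, aid in sorted(scored, key=lambda p: (-p[0], p[1]))]


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked articles that appear in the gold set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for aid in top if aid in gold) / len(top)


# Toy data: article ids and gene labels are made up for illustration.
articles = {
    "pmid1": [("GENE_A", "title"), ("GENE_A", "abstract"), ("GENE_A", "results")],
    "pmid2": [("GENE_A", "methods")],
    "pmid3": [("GENE_B", "title")],
    "pmid4": [("GENE_A", "abstract"), ("GENE_A", "abstract")],
}
gold = {"pmid1", "pmid4"}  # stand-in for a gold-standard gene-to-article mapping

ranked = rank_articles(articles, "GENE_A")
print(ranked)
print(precision_at_k(ranked, gold, 2))
```

Comparing section-specific rankers in this setup would amount to swapping in different weight dictionaries (for example, one that also assigns weight to caption sections) and comparing the resulting precision values.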
Similar resources
Community curation for GeneView
1 Motivation: The latest discoveries of diseases and their diagnosis or treatments have mostly been published in scientific literature. The fast growth of published biomedical articles has led to a strong ambiguity of disease names, meaning that a traditional keyword-based search for biomedical articles will not lead to satisfying results [DL12]. This problem does not only exist for the terms of diseases, i...
GeneView: a comprehensive semantic search engine for PubMed
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects s...
Experiences from Developing the Domain-Specific Entity Search Engine GeneView
GeneView is a semantic search engine for the Life Sciences. Unlike traditional search engines, GeneView searches indexed documents not only at the textual (syntactic) level, but analyzes texts upon import to recognize and properly handle biomedical entities, relationships between those entities, and the structure of documents. This allows for a number of advanced features required to work effe...
AGRA: analysis of gene ranking algorithms
Often, the most informative genes have to be selected from different gene sets and several computer gene ranking algorithms have been developed to cope with the problem. To help researchers decide which algorithm to use, we developed the analysis of gene ranking algorithms (AGRA) system that offers a novel technique for comparing ranked lists of genes. The most important feature of A...
Text Mining in Biograph
The Biograph project is a biomedical knowledge discovery project combining graph data mining with structured biomedical information and with text mining on MEDLINE abstracts. It is a cooperation between the molecular genetics, data mining, and computational linguistics research groups of the University of Antwerp. In this talk, I will outline the general architecture of the system, which is cur...